114 ◾ Bioinformatics
Studies on the consequences of variants focus on understanding the molecular
mechanisms and pathways that link a genotype to a phenotype. Studies of this kind
interpret the consequences of variants for protein function. Simple base substitutions
such as missense, stop gained, and stop lost variants can alter the translated protein
sequence and thereby cause functional consequences. By contrast, the functional
consequences of structural variants are usually defined by the physiological phenotypes
observed. These can be complex descriptions expressed in terms of general phenotypic
traits rather than specific biochemical effects caused by the variant. Clinical functional
consequences are represented by a simple controlled vocabulary that defines the relative
pathogenicity of a variant: benign, likely benign, uncertain significance, likely
pathogenic, or pathogenic.
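To make the effect of a simple base substitution concrete, the following minimal Python sketch translates a reference codon and an altered codon with the standard genetic code and reports the resulting consequence. The function name classify_substitution and the extra "synonymous" label are illustrative assumptions, not part of any variant annotation tool; real annotators also consider transcripts, strand, and reading frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Hypothetical helper: label the consequence of changing one codon into another."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # protein sequence unchanged (label added for completeness)
    if alt_aa == "*":
        return "stop gained"  # premature stop codon introduced
    if ref_aa == "*":
        return "stop lost"    # original stop codon replaced by an amino acid
    return "missense"         # one amino acid replaced by another


print(classify_substitution("GAG", "GTG"))  # missense (Glu -> Val)
print(classify_substitution("TGG", "TGA"))  # stop gained (Trp -> stop)
print(classify_substitution("TAA", "CAA"))  # stop lost (stop -> Gln)
```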
Population genetics is the study of genetic variation within populations of
individuals and of the forces that shape it. This usually involves studying changes in
the frequencies of genetic variants in populations over space and time. The major forces
that shape variation in natural populations include mutation, selection, migration, and
random genetic drift. When a new mutation occurs, it may be beneficial to the organism,
deleterious (harmful) to the organism, or neutral (having no effect on the fitness of the
organism). Beneficial and deleterious mutations are subject to natural selection,
which typically increases and decreases their allele frequencies, respectively. Allele
frequencies are also influenced by random genetic drift, the chance fluctuation of
allele frequencies from one generation to the next.
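Random genetic drift can be illustrated with a short simulation. The sketch below assumes a Wright-Fisher style model, which the text does not name: a diploid population of constant size in which each allele copy of the next generation is drawn at random from the current allele frequency, with no mutation, selection, or migration. The function name simulate_drift and its parameters are hypothetical.

```python
import random


def simulate_drift(initial_freq, population_size, generations, seed=None):
    """Return the allele frequency in each generation under random drift only.

    Assumes a diploid population of constant size (2 * population_size allele
    copies) and no mutation, selection, or migration.
    """
    rng = random.Random(seed)
    n_copies = 2 * population_size
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each allele copy in the next generation is sampled independently
        # with probability equal to the current allele frequency.
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


# Drift is stronger in small populations: compare the spread of the two runs.
small = simulate_drift(0.5, population_size=20, generations=50, seed=1)
large = simulate_drift(0.5, population_size=2000, generations=50, seed=1)
print(min(small), max(small))
print(min(large), max(large))
```

Once the frequency reaches 0 or 1 in this model it stays there, because no mutation reintroduces the lost allele.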
4.2 VARIANT CALLING PROGRAMS
Several programs are available for variant calling, and they use different variant calling
algorithms. The most commonly used variant calling programs fall into two groups:
consensus-based callers such as BCFtools mpileup, and haplotype-based callers such as
FreeBayes and GATK HaplotypeCaller. In the following sections, we discuss these two types
of variant callers with some examples. We assume that the FASTQ files used in the exercise
are preprocessed and clean, as explained in Chapter 1.
4.2.1 Consensus-Based Variant Callers
Consensus-based variant callers rely on the pileup of the aligned reads covering a
position on the reference sequence to call variants (SNVs or InDels). The read alignment
information is stored in a SAM/BAM file. From it, we can examine the pileup of all read
bases covering a given reference position. In most cases, the bases covering that position
will match the reference base, but at a variant site they will differ from it. The
consensus sequence is created by collapsing the bases at each position and choosing the
most frequent base. At some positions, the consensus sequence may differ from the sequence
of the reference genome. These differences can also be due to errors; however, sufficient
sequencing depth provides enough confidence to call the variants. Figure 4.2 shows a diagram of
reads aligned to the sequence of a reference genome and a consensus sequence formed by